Modeling and Analyzing Systems with Redundancy
Abstract
Reducing latency is a primary concern in computer systems. As cloud computing and resource sharing become more prevalent, the problem of how to reduce latency becomes more challenging because there is a high degree of variability in server speeds. Recent computer systems research has shown that the same job can take 12 times or even 27 times longer to run on one machine than another, due to varying background load, garbage collection, network contention, and other factors. This server variability is transient and unpredictable, making it hard to know how long a job will take to run on any given server, and therefore how best to dispatch and schedule jobs. An increasingly popular strategy for combating server variability is redundancy. The idea is to create multiple copies of the same job, dispatch these copies to different servers, and wait for the first copy to complete service. A great deal of empirical computer systems research has demonstrated the benefits of redundancy: using redundancy can yield up to a 50% reduction in mean response time. Unfortunately, there is very little theoretical work analyzing performance in systems with redundancy. This thesis presents the first exact analysis of response time in systems with redundancy. We begin in the Independent Runtimes (IR) model, in which a job’s service times (runtimes) are assumed to be independent across servers. Here we derive exact expressions for the distribution of response time in a certain set of class-based redundancy systems. We also propose two new scheduling policies, Least Redundant First (LRF) and Primaries First (PF), and prove that LRF minimizes overall system response time, while PF is fair across classes of jobs with different redundancy degrees.
While the IR model is appropriate in certain settings, in others it does not make sense because the independence assumption eliminates any notion of an “inherent job size.” The IR model leads to the conclusion that more redundancy is always better, which often is not true in practice. Therefore we propose the S&X model, which is the first model to decouple a job’s inherent size (X) from the server slowdown (S). This model is important because, unlike prior models, it allows a job’s runtimes to be correlated across servers. The S&X model makes it evident that redundancy does not always help: in fact, too much redundancy can lead to instability. To overcome this, we propose a new dispatching policy, Redundant-to-Idle-Queue, which is provably stable in the S&X model, while offering substantial response time improvements compared to systems without redundancy.
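The contrast between the two models can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not give the functional form, so the code assumes a copy's runtime in the S&X model is the product S * X, with one X per job and an independent S per server. The exponential distributions, the function names, and the parameters are all hypothetical choices for illustration, and queueing is ignored entirely.

```python
import random

def ir_runtime(d, rng):
    # IR model: each of the d copies draws its own independent runtime
    # (Exp(1) here, an illustrative assumption); the fastest copy wins.
    return min(rng.expovariate(1.0) for _ in range(d))

def sx_runtime(d, rng, size=None):
    # S&X-style model, ASSUMED multiplicative: runtime on a server = S * X.
    # X is drawn once per job and shared by every copy; S is drawn per server.
    x = rng.expovariate(1.0) if size is None else size
    # Hypothetical slowdown distribution with S >= 1.
    slowdowns = [1.0 + rng.expovariate(2.0) for _ in range(d)]
    return x * min(slowdowns)

def mean_runtime(model, d, n=20000, seed=0):
    # Monte Carlo estimate of the mean time until the first copy finishes.
    rng = random.Random(seed)
    return sum(model(d, rng) for _ in range(n)) / n

if __name__ == "__main__":
    for d in (1, 2, 4):
        print(d, round(mean_runtime(ir_runtime, d), 3),
              round(mean_runtime(sx_runtime, d), 3))
```

Under these assumptions, the IR estimate keeps shrinking as the number of copies d grows, whereas the S&X-style estimate cannot fall below the job's inherent size, since every copy carries the same X. Because the sketch has no queues, it does not reproduce the instability that the abstract attributes to excessive redundancy; it only shows why the independence assumption makes extra copies look better than they are.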
Similar resources
Cold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution
In modeling a cold standby redundancy allocation problem (RAP) with imperfect switching mechanism, deriving a closed form version of a system reliability is too difficult. A convenient lower bound on system reliability is proposed and this approximation is widely used as a part of objective function for a system reliability maximization problem in the literature. Considering this lower bound do...
Reliability Optimization for Complicated Systems with a Choice of Redundancy Strategies (TECHNICAL NOTE)
Redundancy allocation is one of the common techniques to increase the reliability of bridge systems. Many studies on the general redundancy allocation problem assume that the redundancy strategy for each subsystem is predetermined and fixed. In general, active redundancy has received more attention in the past. However, in the real world, a particular system design contains both active and col...
Redundancy allocation problem for k-out-of-n systems with a choice of redundancy strategies
To increase the reliability of a specific system, using redundant components is a common method which is called redundancy allocation problem (RAP). Some of the RAP studies have focused on k-out-of-n systems. However, all of these studies assumed predetermined active or standby strategies for each subsystem. In this paper, for the first time, we propose a k-out-of-...
Using NSGA II Algorithm for a Three Objectives Redundancy Allocation Problem with k-out-of-n Sub-Systems
In new production systems, finding a way to improve product and system reliability at the design stage is very important. The reliability of products and systems may be improved using different methods. One of these methods is the redundancy allocation problem. In this problem, reliability is improved by adding redundant components to subsystems under some constraints. In this paper we worked on ...
Set a bi-objective redundancy allocation model to optimize the reliability and cost of the Series-parallel systems using NSGA II problem
With the huge global and wide range of attention placed upon quality, promoting and optimizing the reliability of products during the design process has turned out to be a high priority. In this study, the researchers have adopted one of the existing models in reliability science and propose a bi-objective model for redundancy allocation in series-parallel systems in accordance with th...
Redundancy Allocation Combined with Supplier Selection for Design of Series-parallel Systems
In this paper a redundancy allocation problem is studied where for the first time the supplier selection is taken into consideration and redundant components are provided from appropriate suppliers with the most suitable offers such as discount on buying price of components, warranty length for components, things like that, so that the system reliability, profit and the warranty length proposed...
Publication date: 2017